The Potential of Automatic Word Comparison for Historical Linguistics

Authors

Johann-Mattis List

Simon J Greenhill

Russell D Gray

Abstract

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help in pre-analyzing data, which experts can later refine. In this way, computational approaches can take care of repetitive and schematic tasks, leaving experts free to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these methods and point to major challenges for future approaches. Current automatic approaches to cognate detection, although not perfect, could become an important component of future research in historical linguistics.
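The kind of comparison described above can be illustrated with a deliberately simple baseline. The Python sketch below is not one of the authors' methods. It assumes plain-string transcriptions, scores word pairs with normalized edit distance, groups words whose distance falls below a hypothetical `threshold` using single-linkage clustering, and compares the result with expert cognate sets using B-cubed precision, recall and F-score. The word forms and cognate labels in the usage lines are invented for illustration. Infomap, the best-performing method in the study, is a network community-detection algorithm and is not reproduced here.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer word."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def cluster(words: list[str], threshold: float) -> list[int]:
    """Single-linkage clustering: any pair closer than `threshold` is merged.

    `threshold` is a hypothetical tuning parameter. Returns one cluster
    label per word, computed with a small union-find over word indices.
    """
    parent = list(range(len(words)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(words)), 2):
        if normalized_distance(words[i], words[j]) < threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(words))]


def b_cubed(predicted: list[int], gold: list[int]) -> tuple[float, float, float]:
    """B-cubed precision, recall and F-score of a predicted partition.

    Each item counts itself in both its predicted and its gold set, so the
    overlap is never empty and the F-score denominator is never zero.
    """
    n = len(gold)
    precision = recall = 0.0
    for i in range(n):
        same_pred = {j for j in range(n) if predicted[j] == predicted[i]}
        same_gold = {j for j in range(n) if gold[j] == gold[i]}
        overlap = len(same_pred & same_gold)
        precision += overlap / len(same_pred)
        recall += overlap / len(same_gold)
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data: five forms for one concept and invented expert cognate sets.
words = ["hand", "hant", "hont", "main", "mano"]
gold = [0, 0, 0, 1, 1]
predicted = cluster(words, threshold=0.5)
print(b_cubed(predicted, gold))
```

With these toy forms the three *h*-words are grouped together, but *main* and *mano* (normalized distance 0.5, not below the threshold) stay apart, so precision is 1.0 while recall drops to 0.8. A single fixed threshold on surface similarity is exactly the kind of simplification that more sophisticated cognate-detection methods try to improve on.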

